library(geiger) # this is for constructing the phylogenetic ANOVA, includes phytools
## Loading required package: ape
## Loading required package: phytools
## Loading required package: maps
library(phytools)
library(ape) # geiger includes ape, but ape is needed for constructing phylogenetic trees
library(tidyverse) # self-explanatory, useful functions and ggplot() is helpful
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ purrr::map() masks maps::map()
## ✖ dplyr::where() masks ape::where()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggimage) # necessary for including images of hominin cranium in plot
library(knitr) # kable tables
library(ggtree) # get a phylogenetic tree with nodes, note must install via BiocManager::install()
## Registered S3 methods overwritten by 'treeio':
## method from
## MRCA.phylo tidytree
## MRCA.treedata tidytree
## Nnode.treedata tidytree
## Ntip.treedata tidytree
## ancestor.phylo tidytree
## ancestor.treedata tidytree
## child.phylo tidytree
## child.treedata tidytree
## full_join.phylo tidytree
## full_join.treedata tidytree
## groupClade.phylo tidytree
## groupClade.treedata tidytree
## groupOTU.phylo tidytree
## groupOTU.treedata tidytree
## inner_join.phylo tidytree
## inner_join.treedata tidytree
## is.rooted.treedata tidytree
## nodeid.phylo tidytree
## nodeid.treedata tidytree
## nodelab.phylo tidytree
## nodelab.treedata tidytree
## offspring.phylo tidytree
## offspring.treedata tidytree
## parent.phylo tidytree
## parent.treedata tidytree
## root.treedata tidytree
## rootnode.phylo tidytree
## sibling.phylo tidytree
## ggtree v3.8.2 For help: https://yulab-smu.top/treedata-book/
##
## If you use the ggtree package suite in published research, please cite
## the appropriate paper(s):
##
## Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam.
## ggtree: an R package for visualization and annotation of phylogenetic
## trees with their covariates and other associated data. Methods in
## Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
##
## Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods
## for mapping and visualizing associated data on phylogeny using ggtree.
## Molecular Biology and Evolution. 2018, 35(12):3041-3043.
## doi:10.1093/molbev/msy194
##
## S Xu, Z Dai, P Guo, X Fu, S Liu, L Zhou, W Tang, T Feng, M Chen, L
## Zhan, T Wu, E Hu, Y Jiang, X Bo, G Yu. ggtreeExtra: Compact
## visualization of richly annotated phylogenetic data. Molecular Biology
## and Evolution. 2021, 38(9):4039-4042. doi: 10.1093/molbev/msab166
##
## Attaching package: 'ggtree'
##
## The following object is masked from 'package:tidyr':
##
## expand
##
## The following object is masked from 'package:ape':
##
## rotate
Biologists and paleontologists have long been interested
genotype:phenotype (G:P) mapped traits; as in, phenotypic traits that
have a clear, albeit likely pleiotropic and complex, genetic
underpinning to them. It has been proposed G:P traits have use in
elucidating phylogenetic trends across taxa. The authors of this paper,
using G:P mapped dental traits test two hypotheses concerning the
utility of G:P mapped traits.
Hypothesis 1: G:P-mapped dental traits can provide evidence of
phylogenetic history and selection, and therefore, are useful in
paleontological investigations.
Hypothesis 2: G:P-mapped traits reveal a range of morphological
variation that cannot be predicted solely through extant
variation.
The goals of this paper are thus to show, in primates, that not only are
G:P mapped dental traits useful in paleontology, but G:P mapped traits
of fossil genera are necessary for understanding extant variation.
The authors measure anterior and posterior molar width, as well as
molar length. From these measurements, because anterior and posterior
widths differ, I calculated molar areas using a trapezoidal area
formaula. In terms of G:P mapped dental traits, molar module component
ratio (MMC) is measured as length(M2)/length(P4), premolar-molar module
ratio (PMM) is is measured as length(M3)/length(M1), and inhibitory
cascade (IC) is measured as area(M3)/area(M1).
IC is correlated with MMC, but is not a G:P mapped trait in primates.
Nevertheless, it is included in summary statistics and model values.
I have replicated summary descriptive statistics of molar areas and
G:P mapped traits (including IC). For my model, I have used
geiger to create phylogenetic ANOVAs of molar areas and G:P
mapped traits. Lastly, for my plot, I used ggplot to create
boxplots of G:P mapped traits across extant primates, alongside trait
values for fossil primates and estimated ancestral state reconstruction
values.
Only some of the data used in this analysis was available as open access
through dryad. This dataset contains dental measurements for the
following species: Presbytis rubicunda, Presbytis
melapholos, Papio hamadryas, Macaca fascicularis,
Colobus guereza, and Cercopithecus mitis. Fossil
genera alongside other extant primates (including some samples of
species included in the available dataset) were not available; papers
using those data are included in the Appendix. Thus, the goals of this
replication are to illustrate whether this reduced data set creates
results that are consistent with the full data set.
Available data for replication comes from: https://datadryad.org/stash/dataset/doi:10.5061/dryad.693j8
using the txt file all_linear_data.txt. I then converted
this data into a .csv file separating each data entry by space.
dental_measurements <- read_csv("all_linear_data.csv") %>%
mutate(genus = ifelse(genus_species == "pres_rub" | # information is grouped by genus for ANOVA and summary
genus_species == "pres_mel", "Presbytis", genus_species)) %>%
mutate(genus = ifelse(genus_species == "pap_ham", "Papio", genus)) %>%
mutate(genus = ifelse(genus_species == "mac_fas", "Macaca", genus)) %>%
mutate(genus = ifelse(genus_species == "col_guer", "Colobus", genus)) %>%
mutate(genus = ifelse(genus_species == "cercop_mit", "Cercopithecus", genus)) %>%
# renaming genus_species to be the full name
mutate(species = ifelse(genus_species == "pres_rub", "Presbytis rubicunda", genus_species)) %>%
mutate(species = ifelse(genus_species == "pres_mel", "Presbytis melapholos", species)) %>%
mutate(species = ifelse(genus_species == "pap_ham", "Papio hamadryas", species)) %>%
mutate(species = ifelse(genus_species == "mac_fas", "Macaca fascicularis", species)) %>%
mutate(species = ifelse(genus_species == "col_guer", "Colobus guereza", species)) %>%
mutate(species = ifelse(genus_species == "cercop_mit", "Cercopithecus mitis", species)) %>%
filter(is.na(species) == F) %>% # remove NAs from data
mutate(tribe = ifelse(genus == "Cercopithecus", # ggplot is grouped by tribe
"Cercopithicini", genus)) %>%
mutate(tribe = ifelse(genus == "Papio" | genus == "Macaca",
"Papionini", tribe)) %>%
mutate(tribe = ifelse(genus == "Colobus" | genus == "Presbytis",
"Colobinae", tribe))
## Rows: 609 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): genus_species, Museum, MuseumID, Sex
## dbl (19): I1LL, I1MD, I2LL, I2MD, CMD, CBL, P3MD, P3BL, P4MD, P4BL, M1L, M1A...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dental_measurements$tribe <- factor(dental_measurements$tribe, levels = c("Cercopithicini", "Papionini", "Colobinae"))
head(dental_measurements) # snapshot of the data frame
## # A tibble: 6 × 26
## genus_species Museum MuseumID Sex I1LL I1MD I2LL I2MD CMD CBL P3MD
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 pres_rub AMNH 103400 M 4.25 4.16 4.38 3.97 5.79 4.29 4.35
## 2 pres_rub AMNH 103624 M 4.2 4.38 4.01 3.64 5.89 4.26 4.13
## 3 pres_rub AMNH 103625 F 4.36 4.44 4.37 4.01 5.29 4.61 4.14
## 4 pres_rub AMNH 103626 F 4.1 3.89 4.07 3.83 5.22 4.25 3.85
## 5 pres_rub AMNH 103627 F 4.22 3.77 4.1 3.54 5.34 4.5 3.89
## 6 pres_rub AMNH 103628 M 3.96 4.26 4.14 3.75 5.85 4.57 4.16
## # ℹ 15 more variables: P3BL <dbl>, P4MD <dbl>, P4BL <dbl>, M1L <dbl>,
## # M1AW <dbl>, M1PW <dbl>, M2L <dbl>, M2AW <dbl>, M2PW <dbl>, M3L <dbl>,
## # M3AW <dbl>, M3PW <dbl>, genus <chr>, species <chr>, tribe <fct>
These are the tables I intend to recreate:
We first must create functions to calculate molar areas, alongside the genotype:phenotype mapped traits (again, MMC = molar module component ratio, PMM = premolar-molar module ratio, IC = inhibitory cascade trait).
trapezoid_area <- function(anterior, posterior, length){ # trapezoidal area is how calculations are done
area = (anterior + posterior) * length/2
return(area)
}
mmc_fn <- function(m3l, m1l){ # this is the MMC function
mmc = m3l/m1l
return(mmc)
}
ic_fn <- function(m3area, m1area){ # this is the IC function
ic = m3area/m1area
return(ic)
}
pmm_fn <- function(m2l, p4l){ # this is the PMM function
pmm = m2l/p4l
return(pmm)
}
Using the available data, I’ve created a table containing summary statistics for molar areas.
m1area <- as.data.frame(trapezoid_area(anterior = dental_measurements$M1AW, # we use the trapezoid function
dental_measurements$M1PW,
dental_measurements$M1L))
m1area <- cbind(dental_measurements$genus, dental_measurements$genus_species, m1area)
colnames(m1area) <- c("genus", "species", "areaM1")
m1areasummary <- m1area %>% filter(is.na(areaM1) == F) %>% group_by(genus) %>%
summarise(n = n(), avg_areaM1 = mean(areaM1), sd_areaM1 = sd(areaM1))
# repeat for M2
m2area <- as.data.frame(trapezoid_area(dental_measurements$M2AW,
dental_measurements$M2PW,
dental_measurements$M2L))
m2area <- cbind(dental_measurements$genus, dental_measurements$genus_species, m2area)
colnames(m2area) <- c("genus", "species", "areaM2")
m2areasummary <- m2area %>% filter(is.na(areaM2) == F) %>% group_by(genus) %>%
summarise(avg_areaM2 = mean(areaM2), sd_areaM2 = sd(areaM2), n = n())
# repeat again with M3
m3area <- as.data.frame(trapezoid_area(dental_measurements$M3AW,
dental_measurements$M3PW,
dental_measurements$M3L))
m3area <- cbind(dental_measurements$genus, dental_measurements$genus_species, m3area)
colnames(m3area) <- c("genus", "species", "areaM3")
m3areasummary <- m3area %>% filter(is.na(areaM3) == F) %>% group_by(genus) %>%
summarise(avg_areaM3 = mean(areaM3), sd_areaM3 = sd(areaM3), n = n())
molar_areas <- mutate(m1areasummary, avg_areaM2 = m2areasummary$avg_areaM2, sd_areaM2 = m2areasummary$sd_areaM2,
avg_areaM3 = m3areasummary$avg_areaM3, sd_areaM3 = m3areasummary$sd_areaM3) # create summary table
kable(molar_areas,
caption = "Summary statistics for two-dimensional area traits")
| genus | n | avg_areaM1 | sd_areaM1 | avg_areaM2 | sd_areaM2 | avg_areaM3 | sd_areaM3 |
|---|---|---|---|---|---|---|---|
| Cercopithecus | 70 | 32.38463 | 3.179153 | 39.36542 | 4.506180 | 29.25969 | 4.409735 |
| Colobus | 102 | 43.99904 | 4.829605 | 53.28384 | 6.020502 | 49.72497 | 6.099159 |
| Macaca | 98 | 39.41465 | 7.818815 | 51.02325 | 10.727690 | 45.97198 | 10.030113 |
| Papio | 80 | 156.13894 | 22.258208 | 219.84220 | 32.591485 | 224.25444 | 37.718486 |
| Presbytis | 145 | 31.42862 | 2.297559 | 32.76888 | 2.674768 | 28.64481 | 2.876546 |
Comparing against the published data:
For all genera, all molar area summary statistics are less than the published data; however, there are no changes in general trends in the dataset.
I then did the same for genotype:phenotype mapped traits.
# MMC
MMC <- as.data.frame(mmc_fn(dental_measurements$M3L, dental_measurements$M1L))
MMC <- cbind(dental_measurements$genus, dental_measurements$genus_species, MMC)
colnames(MMC) <- c("genus", "species", "MMCvalue")
# add to the dental_measurements df
dental_measurements <- cbind(dental_measurements, MMC$MMCvalue) %>% rename("MMC" = "MMC$MMCvalue")
# summary statistics
MMCsummary <- MMC %>% filter(is.na(MMCvalue) == F) %>% group_by(genus) %>%
summarise( n = n(), avgMMC = mean(MMCvalue), sdMMC = sd(MMCvalue))
# PMM
PMM <- as.data.frame(pmm_fn(dental_measurements$M2L, dental_measurements$P4BL))
PMM <- cbind(dental_measurements$genus, dental_measurements$genus_species, PMM)
colnames(PMM) <- c("genus", "species", "PMMvalue")
# add to the dental measurements df
dental_measurements <- dental_measurements %>% mutate(PMM =
PMM$PMMvalue)
# repeat
PMMsummary <- PMM %>% filter(is.na(PMMvalue) == F) %>% group_by(genus) %>%
summarise(avgPMM = mean(PMMvalue), sd = sd(PMMvalue), n = n())
# IC
IC <- as.data.frame(ic_fn(m3area$areaM3, m1area$areaM1))
IC <- cbind(dental_measurements$genus, dental_measurements$genus_species, IC)
colnames(IC) <- c("genus", "species", "IC")
# not included in ggplot, no need to readd to df
ICsummary <- IC %>% filter(is.na(IC) == F) %>% group_by(genus) %>%
summarise(avgIC = mean(IC), sd = sd(IC), n = n())
genotype_phenotype_traits <- mutate(MMCsummary, avgPMM = PMMsummary$avgPMM, sdPMM = PMMsummary$sd,
avgIC = ICsummary$avgIC, sdIC = ICsummary$sd)
kable(genotype_phenotype_traits,
caption = "Summary statistics for Genotype:Phenotype mapped traits")
| genus | n | avgMMC | sdMMC | avgPMM | sdPMM | avgIC | sdIC |
|---|---|---|---|---|---|---|---|
| Cercopithecus | 82 | 0.9482998 | 0.0595946 | 1.362367 | 0.0638758 | 0.8915207 | 0.0920864 |
| Colobus | 118 | 1.0771410 | 0.0576557 | 1.129409 | 0.0668292 | 1.1315981 | 0.1015021 |
| Macaca | 80 | 1.1171366 | 0.0820211 | 1.296755 | 0.0902814 | 1.1857692 | 0.1448390 |
| Papio | 86 | 1.2227169 | 0.0696412 | 1.778159 | 0.0887659 | 1.4007564 | 0.1214081 |
| Presbytis | 151 | 0.9682343 | 0.0537147 | 1.034133 | 0.0587841 | 0.9126498 | 0.0803741 |
Comparing against the published data:
General trends across G:P mapped
traits are consistent across these genera for both datasets although
there are some differences.
Presbytis: Average and standard deviation for MMC are identical
to published data (sample is the same, confirms methods work); standard
deviation for IC is identical as well. Average IC and standard deviation
for MMC are less than published data. Average PMM is much less than in
published data.
Papio: Average and standard deviation for MMC, alongside
standard deviation for PMM are identical to published data. Average PMM
is greater than published data; and average IC and standard deviation
for IC are less.
Macaca: Average and standard deviation for MMC are identical to
published data. Standard deviations for PMM and IC are identical to
published data. Average PMM and IC are much less than published
data.
Colobus: Average and standard deviation for MMC are identical
to published data (sample is the same, confirms methods work). Standard
deviations for PMM and IC are identical to published despite sample size
differences. Average PMM and IC are much lower than published
results.
Cercopithecus: Average and standard deviation for MMC are
identical to published data; standard deviation for IC is also
identical. Average and standard deviation for PMM and average IC are
less than the published values.
The model I have chosen to replicate is:
Following the procedure outlined in the paper, I downloaded a nexus file of primate phylogeny from the 10kTrees v.3 database. Within the nexus file, in TextEdit, I changed the date of divergence for Presbytis rubicunda to be 1.3 million years ago since this data was not included (this is identical to the procedure done in the paper).
primate.tree <- ape::read.nexus("datarep_phylogeny.nex") # import in nexus tree
ggtree(primate.tree) + geom_nodepoint() + # construct ggtree and label nodes
geom_text(aes(label = node, hjust = -.4)) +
geom_tiplab(aes(vjust = -1, hjust = 1)) +
geom_cladelabel(node = 10, label = "",
color = "green4", offset = 1, align = F) +
geom_cladelabel(node = 9, label="",
color = "steelblue3", offset = 1, align = T) +
geom_cladelabel(node = 1, label="",
color = "red4", offset = 1, align = T)
For aov.phylo, we look at genus level differences; however, our data must be subset by species. The following chunk gives summary statistics but groups by species, results for m1 are printed.
m1areasummary <- m1area %>% filter(is.na(areaM1) == F) %>% group_by(species) %>%
summarise(avg_areaM1 = mean(areaM1), sd = sd(areaM1), n = n()); m1areasummary
## # A tibble: 6 × 4
## species avg_areaM1 sd n
## <chr> <dbl> <dbl> <int>
## 1 cercop_mit 32.4 3.18 70
## 2 col_guer 44.0 4.83 102
## 3 mac_fas 39.4 7.82 98
## 4 pap_ham 156. 22.3 80
## 5 pres_mel 32.2 2.54 69
## 6 pres_rub 30.7 1.78 76
m2areasummary <- m2area %>% filter(is.na(areaM2) == F) %>% group_by(species) %>%
summarise(avg_areaM2 = mean(areaM2), sd = sd(areaM2), n = n())
m3areasummary <- m3area %>% filter(is.na(areaM3) == F) %>% group_by(species) %>%
summarise(avg_areaM3 = mean(areaM3), sd = sd(areaM3), n = n())
MMCsummary <- MMC %>% filter(is.na(MMCvalue) == F) %>% group_by(species) %>%
summarise(avgMMC = mean(MMCvalue), sd = sd(MMCvalue), n = n())
PMMsummary <- PMM %>% filter(is.na(PMMvalue) == F) %>% group_by(species) %>%
summarise(avgPMM = mean(PMMvalue), sd = sd(PMMvalue), n = n())
ICsummary <- IC %>% filter(is.na(IC) == F) %>% group_by(species) %>%
summarise(avgIC = mean(IC), sd = sd(IC), n = n())
In order to construct the phylogenetic ANOVA, using
aov.phylo(), the data must also be cleaned as a factor
variable. Measurements of interest must also be isolated.
genus_species <- as.factor(c(rep(0, 1), rep(1, 1), rep(2, 1), rep(3, 1), rep(4, 2))) # factor variable, presbytis gets two levels
genera <- c("Cercopithecus", "Colobus", "Macaca", "Papio", "Presbytis") # creating for final table
names(genus_species) = m1areasummary$species
m1avg_area <- m1areasummary$avg_areaM1
names(m1avg_area) <- m1areasummary$species
m2avg_area <- m2areasummary$avg_areaM2
names(m2avg_area) <- m2areasummary$species
m3avg_area <- m3areasummary$avg_areaM3
names(m3avg_area) <- m3areasummary$species
avgMMC <- MMCsummary$avgMMC
names(avgMMC) <- MMCsummary$species
avgPMM <- PMMsummary$avgPMM
names(avgPMM) <- PMMsummary$species
avgIC <- ICsummary$avgIC
names(avgIC) <- ICsummary$species
Here (Intercept) is Cercopithecus, group1 is Colobus, group2 is Macaca, group3 is Papio, and group4 is Presbytis.
set.seed(812)
phyloAOV <- data.frame(genera)
m1area_phyloAOV <- aov.phylo(m1avg_area~genus_species, phy = primate.tree, nsim = 1000) # 1000 simulations
## Analysis of Variance Table
##
## Response: dat
## Df Sum-Sq Mean-Sq F-value Pr(>F) Pr(>F) given phy
## group 4 12208.3 3052.08 2704.9 0.01442 0.01898 *
## Residuals 1 1.1 1.13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m1area_phyloAOV) # we need to extract p-values alongside F statistic p-values
##
## Call:
## lm(formula = dat ~ group)
##
## Residuals:
## cercop_mit mac_fas pap_ham col_guer pres_mel pres_rub
## -5.624e-17 7.932e-19 -2.296e-17 -2.936e-18 7.511e-01 -7.511e-01
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.3846 1.0622 30.487 0.02087 *
## group1 11.6144 1.5022 7.731 0.08189 .
## group2 7.0300 1.5022 4.680 0.13402
## group3 123.7543 1.5022 82.380 0.00773 **
## group4 -0.9197 1.3010 -0.707 0.60823
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.062 on 1 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9995
## F-statistic: 2705 on 4 and 1 DF, p-value: 0.01442
# F given phylogeny = 0.01798
phyloAOV <- phyloAOV %>% mutate(m1area = summary(m1area_phyloAOV)$coefficients[, "Pr(>|t|)"]) # isolate p values and add to data frame
m2area_phyloAOV <- aov.phylo(m2avg_area~genus_species, phy = primate.tree, nsim = 1000) # m2area
## Analysis of Variance Table
##
## Response: dat
## Df Sum-Sq Mean-Sq F-value Pr(>F) Pr(>F) given phy
## group 4 26789.4 6697.3 4515.1 0.011161 0.01499 *
## Residuals 1 1.5 1.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary(m2area_phyloAOV)
# F = 0.01499
m3area_phyloAOV <- aov.phylo(m3avg_area~genus_species, phy = primate.tree, nsim = 1000) # m3
## Analysis of Variance Table
##
## Response: dat
## Df Sum-Sq Mean-Sq F-value Pr(>F) Pr(>F) given phy
## group 4 29833 7458.2 885160 0.00079717 0.002997 **
## Residuals 1 0 0.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary(m3area_phyloAOV)
# F = 0.000999
MMC_phyloAOV <- aov.phylo(avgMMC~genus_species, phy = primate.tree, nsim = 1000) # MMC
## Analysis of Variance Table
##
## Response: dat
## Df Sum-Sq Mean-Sq F-value Pr(>F) Pr(>F) given phy
## group 4 0.058524 0.014631 39.546 0.11864 0.1528
## Residuals 1 0.000370 0.000370
# summary(MMC_phyloAOV)
# F = 0.1499
PMM_phyloAOV <- aov.phylo(avgPMM~genus_species, phy = primate.tree, nsim = 1000) # PMM
## Analysis of Variance Table
##
## Response: dat
## Df Sum-Sq Mean-Sq F-value Pr(>F) Pr(>F) given phy
## group 4 0.39849 0.099622 115611 0.0022058 0.004995 **
## Residuals 1 0.00000 0.000001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary(PMM_phyloAOV)
# F = 0.002997
IC_phyloAOV <- aov.phylo(avgIC~genus_species, phy = primate.tree, nsim = 1000) # IC
## Analysis of Variance Table
##
## Response: dat
## Df Sum-Sq Mean-Sq F-value Pr(>F) Pr(>F) given phy
## group 4 0.20825 0.052062 59.191 0.097142 0.1039
## Residuals 1 0.00088 0.000880
# summary(IC_phyloAOV)
# F = 0.0969
phyloAOV <- phyloAOV %>% mutate(m2area = summary(m2area_phyloAOV)$coefficients[, "Pr(>|t|)"],
m3area = summary(m3area_phyloAOV)$coefficients[, "Pr(>|t|)"],
IC = summary(IC_phyloAOV)$coefficients[, "Pr(>|t|)"],
MMC = summary(MMC_phyloAOV)$coefficients[, "Pr(>|t|)"],
PMM = summary(PMM_phyloAOV)$coefficients[, "Pr(>|t|)"])
# add all to data frame
kable(phyloAOV,
caption = "Phylogenetic ANOVA p-values for genus level effects on dental traits")
| genera | m1area | m2area | m3area | IC | MMC | PMM |
|---|---|---|---|---|---|---|
| Cercopithecus | 0.0208742 | 0.0196900 | 0.0019972 | 0.0211700 | 0.0129109 | 0.0004338 |
| Colobus | 0.0818876 | 0.0783831 | 0.0040381 | 0.1101072 | 0.1324621 | 0.0035875 |
| Macaca | 0.1340233 | 0.0933827 | 0.0049449 | 0.0901360 | 0.1016938 | 0.0127361 |
| Papio | 0.0077275 | 0.0060755 | 0.0004238 | 0.0523155 | 0.0628999 | 0.0020100 |
| Presbytis | 0.6082328 | 0.1415747 | 0.1151264 | 0.6711180 | 0.5403572 | 0.0022049 |
f_statistics <- as.data.frame(matrix(nrow = 1, ncol = 6))
names(f_statistics) = c("m1area", "m2area", "m3area", "MMC", "PMM", "IC")
f_statistics[1,] = c(0.01798, 0.01499, 0.000999, 0.1499, 0.002997, 0.0969)
kable(f_statistics,
caption = "Phylogenetic ANOVA p-values for each dental trait")
| m1area | m2area | m3area | MMC | PMM | IC |
|---|---|---|---|---|---|
| 0.01798 | 0.01499 | 0.000999 | 0.1499 | 0.002997 | 0.0969 |
Comparing against published data:
P-values evidently differ between the published data and the subset
provided; differences in statistical conclusions for each genus are
reported:
Presbytis: M1Area, M2Area, M3Area are all insignificant in the
subset. IC and MMC are consistent. PMM is significant (p=0.002).
Papio: IC and MMC are both insignificant in the subsetted
ANOVA. All other traits are significant.
Macaca: M3Area (p=0.0049) and PMM (p=0.013) are significant.
All other traits are insignificant.
Colobus: M3Area (p=0.0040) and PMM (p=0.0036) are significant.
All other traits are insignificant.
Cercopithecus: All traits are significant, including PMM
(p=0.00043).
With a reduced phylogeny, it makes sense that p-values would differ
radically. For instance, variation in other extant genera illustrating
Presbytis’ relative significant M1,M2, and M3Areas is omitted
in this reduced dataset. The divergence in results should not be
concerning but rather illustrates the effects of including/excluding
other groups in an ANOVA.
Regarding differences in trait values across these five genera, IC and
MMC are insignificant in the subsetted dataset. This result could be a
result of excluding other groups (and more variation) or could be
indicative of less variation in these traits among these five
genera.
Plot I intend to replicate:
Ancestral state reconstruction are estimated values for a trait (MMC and PMM here) at specified tree nodes (nodes 7-11, see above tree).
contMap(primate.tree, avgMMC)
ancestralMMC <- fastAnc(primate.tree, avgMMC)
contMap(primate.tree, avgPMM)
ancestralPMM <- fastAnc(primate.tree, avgPMM) # differences in cercopithicus and colobus in MMC/PMM are very interesting here!
ancestral_nodes <- data.frame(ASR_MMC = ancestralMMC, ASR_PMM = ancestralPMM)
The paper assigns ASR estimations MMC and PMM accordingly to ancestral fossils based on phylogeny and plausible ASR values. Because only some of the extant species are included in the available data, I attempted to match as closely as possible to the original data set.
ancestral_fossils <- data.frame(Fossil_rep = c("Victoriapithecus", "Procercocebus", "Soromandrillus",
"Parapapio", "Pliopapio", "Paracolobus", "Cercopithecoides", "Kuseracolobus", "Libypithecus", "cf. Chlorocebus", "Colobus sp."),
tribe = c("", "Papionini", "Papionini", "Papionini", "Papionini", "Colobinae", "Colobinae", "Colobinae", "Colobinae", "Cercopithicini", "Colobinae"),
MMC = c(1.03, 1.13, 1.28, 1.18, 1.20, 1.16, 1.18, 1.17, 1.14, 0.98, 1.06),
PMM = c(1.59, 1.62, 1.78, 1.76, 1.72, 1.50, 1.67, 1.75, 1.41, 1.56, 1.44)
)
# note that these nodes are being assigned based on the tree in paper alongside how well ASR values match
ancestral_nodes <- ancestral_nodes %>%
# concrete matching is outside the scope of this replication, matching ASR values for sake of ggplot()!
mutate(closest_fossil_rep =
c("Victoriapithecus", "cf. Chlorocebus", "Procercocebus", "Libypithecus", "Paracolobus"),
tribe = c("", "Cercopithicini", "Papionini", "Colobinae", "Colobinae"),
nodes = c("N7", "N8", "N9", "N10", "N11"))
I am trying to recreate the ggplot in the paper as best I can.
library(curl) # failed to compile (because of BiocManager?), loading curl just in case
## Warning: package 'curl' was built under R version 4.3.1
## Using libcurl 8.1.2 with LibreSSL/3.3.6
##
## Attaching package: 'curl'
## The following object is masked from 'package:readr':
##
## parse_date
# note I have an earlier version of this document pushed to GitHub that tried to remove the fossil genera from the x-axis using patchwork and gridExtra, was not a success
MMC_ggplot <- ancestral_fossils %>% ggplot(aes(x = Fossil_rep, y = MMC)) +
geom_text(aes(label = Fossil_rep, angle = 90), size = 3, position = position_nudge(y = 0.12)) +
ylim(0.75, 1.5) +
geom_point(position = position_dodge(preserve = "single")) +
geom_image(image = "https://upload.wikimedia.org/wikipedia/commons/thumb/7/75/Australopithecus_Skull.png/652px-Australopithecus_Skull.png") +
geom_boxplot(data = ancestral_nodes, aes(x = closest_fossil_rep, y = ASR_MMC, color = nodes)) +
# using a boxplot to delineate ASR states
geom_boxplot(data = dental_measurements, aes(x = genus, y = MMC, fill = tribe)) +
facet_grid(.~tribe, scales = "free_x", space = "free") +
theme(axis.title.x = element_blank(),
panel.spacing = unit(0, "lines"),
axis.text = element_text(angle = 90, vjust = 0.5, hjust=1),
panel.background = element_rect(color = "black")) +
guides(fill = F); MMC_ggplot # guides removes the tribe coloring in legend
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 91 rows containing non-finite values (`stat_boxplot()`).
## Warning: Width not defined
## ℹ Set with `position_dodge(width = ...)`
## Warning in max(table(panel$xmin)): no non-missing arguments to max; returning
## -Inf
## Warning in max(table(panel$xmin)): no non-missing arguments to max; returning
## -Inf
## Warning in max(table(panel$xmin)): no non-missing arguments to max; returning
## -Inf
## Warning in max(table(panel$xmin)): no non-missing arguments to max; returning
## -Inf
PMM_ggplot <- ancestral_fossils %>% ggplot(aes(x = Fossil_rep, y = PMM)) +
geom_text(aes(label = Fossil_rep, angle = 90), size = 3, position = position_nudge(y = 0.12)) +
ylim(1, 2) +
geom_point(position = position_dodge(preserve = "single")) +
geom_image(image = "https://upload.wikimedia.org/wikipedia/commons/thumb/7/75/Australopithecus_Skull.png/652px-Australopithecus_Skull.png") +
geom_boxplot(data = ancestral_nodes, aes(x = closest_fossil_rep, y = ASR_PMM, color = nodes)) +
geom_boxplot(data = dental_measurements, aes(x = genus, y = PMM, fill = tribe)) +
facet_grid(.~tribe, scales = "free_x", space = "free") +
theme(axis.title.x = element_blank(),
panel.spacing = unit(0, "lines"),
axis.text = element_text(angle = 90, vjust = 0.5, hjust=1),
panel.background = element_rect(color = "black")) +
guides(fill = F); PMM_ggplot
## Warning: Removed 208 rows containing non-finite values (`stat_boxplot()`).
## Warning: Width not defined
## ℹ Set with `position_dodge(width = ...)`
## Warning in max(table(panel$xmin)): no non-missing arguments to max; returning
## -Inf
## Warning in max(table(panel$xmin)): no non-missing arguments to max; returning
## -Inf
## Warning in max(table(panel$xmin)): no non-missing arguments to max; returning
## -Inf
## Warning in max(table(panel$xmin)): no non-missing arguments to max; returning
## -Inf
The descriptive statistics I provide illustrate differences between
the two datasets; my estimates are generally a bit lower than those
provided in the paper. My phylogenetic ANOVA differs from the paper
however. Across cercopithecids, I found that all traits except IC and
MMC vary significantly across taxa. Significance in a trait, on a genus
level, is associated with differentiation of a trait relative to other
taxa. Like the paper, I found Papio to be highly significant
for most traits (indicating divergence); however, MMC and IC are
insignificant. Perhaps (when omitting other species), these traits in
Papio appear similar to Macaca when ran through the
ANOVA. Surprisingly, Presbytis was highly insignificant in
molar area compared to the paper. The authors hypothesize morphological
differences in Presbytis are unexplained by G:P mapped traits;
however, my results show there is no difference in both molar mophology
nor in G:P mapped traits. My inconsistency is probably a result of the
missing data, maybe including Nasalis (alongside other
colobines) would key into differences. Another major difference between
the two analyses worth noting is the significance of M3Area in
Macaca. Maybe Macaca appears more distinct when
removing other papionins and cercopithicins.
The boxplot appears relatively similar to that in the paper. I was
unable to replicate it perfectly (e.g., I’m unsure which aesthetics are
being used for the ancestral nodes in the paper), but I was able to
replicate to the best of my ability. I included nodes on in a separate
legend to keep the plot itself less cluttered. Concerning ASR, like the
paper my ASR values are noticeably smaller than the fossil genera.
The authors find support for Hypothesis 1, showing high heritability of
G:P mapped traits (not part of my replication). As seen in my
replication, MMC and IC are consistent, in line with the notion that
these traits have a common genetic underpinning to captured variation.
ASR values are skewed heavily (moreso in my data) by extant genera (“the
tyranny of the present”). Because of this discrepancy in ASR values, I
find support for Hypothesis 2.
Through this data replication, I learned more about the R environment
alongside tidyverse (e.g., theme_bw() breaks
the plot). Although my dataset is incomplete relative to that of the
paper, I am convinced of the statistics the authors use as well as their
conclusions. Even with a smaller dataset, and the limitations
associated, I’ve found agreement with the authors’ conclusions. I gained
an appreciation of the sensitivity of ANOVAs to including/excluding
groups; while I’m not one to rely heavily on ANOVAs in general, I think
I’ll be using GLMMs (when looking at variation in a continuous trait in
accordance with categorical groups) in the future as a result.
Grieco, Theresa M.; Rizk, Oliver T.; Hlusko, Leslea J. (2012). Data
from: A modular framework characterizes micro- and macroevolution of Old
World monkey dentitions [Dataset]. Dryad. https://doi.org/10.5061/dryad.693j8
Hlusko, L.J.; Schmitt, C.A.; Monson, T.A.; Brasil, M.F.; Mahaney, M.C.
The Integration of quantitative genetics, paleontology, and neontology
reveals genetic underpinnings of primate dental evolution. Proc. Natl.
Acad. Sci. USA 2016, 113, 9262–9267. [CrossRef]
Monson, T.A.; Brasil, M.F.; Mahaney, M.C.; Schmitt, C.A.; Taylor, C.E.;
Hlusko, L.J. Keeping 21st Century Paleontology Grounded: Quantitative
Genetic Analyses and Ancestral State Reconstruction Re-Emphasize the
Essentiality of Fossils. Biology 2022, 11, 1218. https://doi.org/10.3390/
biology11081218
Frost, S.R. Fossil Cercopithecidae from the Afar Depression,
Ethiopia: Species Systematics and Comparison to the Turkana Basin.
Doctoral Dissertation, City University of New York, New York, NY, USA,
2001.
Frost, S.R.; Alemseged, Z. Middle Pleistocene fossil Cercopithecidae
from Asbole, Afar Region, Ethiopia. J. Hum. Evol. 2007, 53, 227–259.
[CrossRef] [PubMed]
Freedman, L. The fossil Cercopithecoidea of South Africa. Ann.
Transvaal Mus. 1957, 23, 121–262.
Frost, S.R.; Haile-Selassie, Y.; Hlusko, L.J. Chapter 6 Cercopithecidae.
In Ardipithecus kadabba: Late Miocene Evidence from Middle Awash,
Ethiopia; Haile-Selassie, Y., WoldeGabriel, G., Eds.; University of
California Press: Berkeley, CA, USA, 2009.
Frost, S.R.; Jablonski, N.G.; Haile-Selassie, Y. Early Pliocene
Cercopithecidae from Woranso-Mille (Central Afar, Ethiopia) and the
origins of the Theropithecus oswaldi lineage. J. Hum. Evol. 2014, 76,
39–53. [CrossRef] [PubMed]
Frost, S.R. Fossil Cercopithecidae from the Middle Pleistocene Dawaitoli
Formation, Middle Awash Valley, Afar Region, Ethiopia. Am. J. Phys.
Anthropol. 2007, 134, 460–471. [CrossRef]
Frost, S.R.; Marcus, L.F.; Bookstein, F.L.; Reddy, D.P.; Delson, E.
Cranial allometry, phylogeography, and systematics of large- bodied
papionins (Primates: Cercopithecinae) inferred from geometric
morphometric analysis of landmark data. Anat. Rec. 2003, 275A,
1048–1072. [CrossRef]
Hlusko, L.J. A new large Pliocene colobine species (Mammalia: Primates)
from Asa Issie, Ethiopia. Geobios 2006, 39, 57–69. [CrossRef]
Hlusko, L.J. A new late Miocene species of Paracolobus and other
Cercopithecoidea (Mammalia: Primates) fossils from Lemudong’o, Kenya.
Kirtlandia 2007, 56, 72–85.
Benefit, B.K. The permanent dentition and phylogenetic position of
Victoriapithecus from Maboko Island, Kenya. J. Hum. Evol. 1993, 25,
83–172. [CrossRef]